Abstract

In this paper we formalize the notions of information elements and information lattices, first 
proposed by Shannon. Exploiting this formalization, we identify a comprehensive parallelism between 
information lattices and subgroup lattices. Qualitatively, we demonstrate isomorphisms between information
lattices and subgroup lattices. Quantitatively, we establish a decisive approximation relation between
the entropy structures of information lattices and the log-index structures of the corresponding subgroup 
lattices. This approximation extends the approximation for joint entropies carried out previously by Chan 
and Yeung. As a consequence of our approximation result, we show that any continuous law holds in 
general for the entropies of information elements if and only if the same law holds in general for 
the log-indices of subgroups. As an application, by constructing subgroup counterexamples we find 
surprisingly that common information, unlike joint information, obeys neither the submodularity nor the 
supermodularity law. We emphasize that the notion of information elements is conceptually significant — 
formalizing it helps to reveal the deep connection between information theory and group theory. The 
parallelism established in this paper admits an appealing group-action explanation and provides useful 
insights into the intrinsic structure among information elements from a group-theoretic perspective. 
I. Introduction

Information theory was born with the celebrated entropy formula measuring the amount 
of information for the purpose of communication. However, a suitable mathematical model 
for information itself remained elusive over the last sixty years. It is reasonable to assume 
that information theorists have had certain intuitive conceptions of information, but in this 
paper we seek a mathematical model for such a conception. In particular, building on Shannon's
work [1], we formalize the notion of information elements to capture the syntactical essence
of information, and identify information elements with $\sigma$-algebras and sample-space-partitions.
As we shall see in the following, by building such a mathematical model for information and 
identifying the lattice structure among information elements, the seemingly surprising connection 
between information theory and group theory, established by Chan and Yeung [2], is revealed 
via isomorphism relations between information lattices and subgroup lattices. Consequently, a 
fully-fledged and decisive approximation relation between the entropy structure of information 
lattices and the subgroup-index structure of corresponding subgroup lattices is obtained. 

We first motivate our formal definition for the notion of information elements. 

A. Informationally Equivalent Random Variables 

Recall the profound insight offered by Shannon [3] on the essence of communication: "the
fundamental problem of communication is that of reproducing at one point exactly or approximately
a message selected at another point." Consider the following motivating example. Suppose
a message, in English, is delivered from person A to person B. Then, the message is translated 
and delivered in German by person B to person C (perhaps because person C does not know 
English). Assuming the translation is faithful, person C should receive the message that person A 
intends to convey. Reflecting upon this example, we see that the message (information) assumes 
two different "representations" over the process of the entire communication — one in English and 
the other in German, but the message (information) itself remains the same. Similarly, coders 
(decoders), essential components of communication systems, perform the similar function of 
"translating" one representation of the same information to another one. This suggests that 
"information" itself should be defined in a translation invariant way. This "translation-invariant" 
quality is precisely how we seek to characterize information. 
To introduce our formal definition for information elements to capture the essence of information
itself, we note that information theory is built within the probabilistic framework, in
which one-time information sources are usually modeled by random variables. Therefore, we 
start in the following with the concept of informational equivalence between random variables 
and develop the formal concept of information elements from first principles. 

Recall that, given a probability space $(\Omega, \mathcal{F}, P)$ and a measurable space $(S, \mathcal{S})$, a random
variable is a measurable function from $\Omega$ to $S$. The set $S$ is usually called the state space of the
random variable, and $\mathcal{S}$ is a $\sigma$-algebra on $S$. The set $\Omega$ is usually called the sample space; $\mathcal{F}$ is
a $\sigma$-algebra on $\Omega$, usually called the event space; and $P$ denotes a probability measure on the
measurable space $(\Omega, \mathcal{F})$.

To illustrate the idea of informational equivalence, consider a random variable $X : \Omega \to S$
and another random variable $X' = f(X)$, where the function $f : S \to S'$ is bijective. Certainly,
the two random variables $X$ and $X'$ are technically different, for they have different codomains.
However, it is intuitively clear that they are "equivalent" in some sense. In particular, one
can infer the exact state of $X$ by observing that of $X'$, and vice versa. For this reason, we may
say that the two random variables $X$ and $X'$ carry the same piece of information. Note that the
$\sigma$-algebras induced by $X$ and $X'$ coincide with each other. In fact, two random variables such
that the state of one can be inferred from that of the other induce the same $\sigma$-algebra. This leads
to the following definition for informational equivalence.

Definition 1: We say that two random variables $X$ and $X'$ are informationally equivalent,
denoted $X \cong X'$, if the $\sigma$-algebras induced by $X$ and $X'$ coincide.

It is easy to verify that the "being-informationally-equivalent" relation is an equivalence relation.
The definition reflects our intuition, as demonstrated in the previous motivating examples, that two
random variables carry the same piece of information if and only if they induce the same $\sigma$-algebra.
This motivates the following definition for information elements to capture the syntactical essence 
of information itself. 

Definition 2: An information element is an equivalence class of random variables with respect 
to the "being-informationally-equivalent" relation. 

We call the random variables in the equivalence class of an information element $m$ representing
random variables of $m$. Alternatively, we say that a random variable $X$ represents $m$.

We believe that our definition of information elements reflects exactly Shannon's original 
intention [1]:

Thus we are led to define the actual information of a stochastic process as that which 
is common to all stochastic processes which may be obtained from the original by 
reversible encoding operations. 
Intuitive (also informal) discussion on identifying "information" with $\sigma$-algebras surfaces often
in probability theory, martingale theory, and mathematical finance. In probability theory, see for
example [4], the concept of conditional probability is usually introduced with discussion of
treating the $\sigma$-algebras conditioned on as the "partial information" available to "observers." In
martingale theory and mathematical finance, see for example [5], [6], filtrations (increasing
sequences of $\sigma$-algebras) are often interpreted as records of the information available over time.
1) A Few Observations: 

Proposition 1: If $X \cong X'$, then $H(X) = H(X')$.

(Throughout the paper, we use H(X) to denote the entropy of random variable X.) 

The converse to Proposition 1 fails: two random variables with the same entropy do not
necessarily carry the same information. For example, consider two binary random variables
$X, Y : \Omega \to \{0, 1\}$, where $\Omega = \{a, b, c, d\}$ and $P$ is uniform on $\Omega$. Suppose $X(\omega) = 0$ if $\omega \in \{a, b\}$
and $1$ otherwise, and $Y(\omega) = 0$ if $\omega \in \{a, c\}$ and $1$ otherwise. Clearly, we have $H(X) = H(Y) = 1$,
but one can readily agree that X and Y do not carry the same information. Therefore, the notion 
of "informationally-equivalent" is stronger than that of "identically-distributed." 
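The counterexample above can be checked numerically. The following is a minimal sketch, assuming the random variables are encoded as Python dictionaries from sample points to states; the helper names (`induced_partition`, `entropy`, `informationally_equivalent`) and the relabeled variable `X_RELABELED` are illustrative and do not come from the paper.

```python
from math import log2

# Sample space and uniform probability measure from the example above.
OMEGA = ["a", "b", "c", "d"]
P = {w: 0.25 for w in OMEGA}

# The two binary random variables of the example (dictionary encoding is an assumption).
X = {"a": 0, "b": 0, "c": 1, "d": 1}
Y = {"a": 0, "b": 1, "c": 0, "d": 1}
# A relabeling of X through a bijection of its state space (hypothetical labels).
X_RELABELED = {"a": "lo", "b": "lo", "c": "hi", "d": "hi"}


def induced_partition(rv):
    """Partition of OMEGA into the preimages of the single states of rv."""
    parts = {}
    for w in OMEGA:
        parts.setdefault(rv[w], set()).add(w)
    return frozenset(frozenset(p) for p in parts.values())


def entropy(rv):
    """Shannon entropy (in bits) of the distribution induced by rv."""
    masses = [sum(P[w] for w in part) for part in induced_partition(rv)]
    return -sum(m * log2(m) for m in masses if m > 0)


def informationally_equivalent(rv1, rv2):
    """Two random variables are equivalent when they induce the same partition."""
    return induced_partition(rv1) == induced_partition(rv2)


print(entropy(X), entropy(Y))                      # equal entropies
print(informationally_equivalent(X, Y))            # different information
print(informationally_equivalent(X, X_RELABELED))  # same information, other labels
```

For finite sample spaces, comparing induced partitions is the same as comparing induced $\sigma$-algebras, which is why the sketch works with partitions only.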

On the other hand, we see that the notion of "informationally-equivalent" is weaker than that 
of "being-equal." 

Proposition 2: If $X = X'$, then $X \cong X'$.

The converse to Proposition 2 fails as well, since two informationally equivalent random
variables $X$ and $X'$ may have totally different state spaces, so that it does not even make sense
to say $X = X'$.

As shown in the following proposition, the notion of "informational equivalence" characterizes 
a kind of state space invariant "equalness." 

Proposition 3: Two random variables $X$ and $Y$ with state spaces $\mathcal{X}$ and $\mathcal{Y}$, respectively, are
informationally equivalent if and only if there exists a one-to-one correspondence $f : \mathcal{X} \to \mathcal{Y}$
such that $Y = f(X)$.
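Proposition 3 suggests a direct test for informational equivalence: try to build the correspondence $f$ point by point. The sketch below assumes finite random variables given as dictionaries on a common sample space, and it assumes that every state in each state space is actually attained; the function name `state_bijection` is hypothetical.

```python
def state_bijection(rv_x, rv_y):
    """Return a dict f with rv_y[w] == f[rv_x[w]] for every sample point w,
    provided f is one-to-one; return None if no such correspondence exists."""
    if rv_x.keys() != rv_y.keys():
        return None  # the two variables must live on the same sample space
    f = {}
    for w in rv_x:
        x, y = rv_x[w], rv_y[w]
        # setdefault records f(x) = y the first time x is seen.
        if f.setdefault(x, y) != y:
            return None  # one state of X would have to map to two states of Y
    if len(set(f.values())) != len(f):
        return None  # two states of X map to the same state of Y
    return f


X = {"a": 0, "b": 0, "c": 1, "d": 1}
X_PRIME = {"a": "lo", "b": "lo", "c": "hi", "d": "hi"}
Y = {"a": 0, "b": 1, "c": 0, "d": 1}

print(state_bijection(X, X_PRIME))  # a relabeling of the states of X
print(state_bijection(X, Y))        # no correspondence exists
```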

Remark: Throughout the paper, we fix a probability space unless otherwise stated. For ease of 
presentation, we confine ourselves in the following to finite discrete random variables. However,
most of the definitions and results can be applied to more general settings without significant 
difficulties. 

B. Identifying Information Elements via $\sigma$-algebras and Sample-Space-Partitions

Since the $\sigma$-algebras induced by informationally equivalent random variables are the same,
we can unambiguously identify information elements with $\sigma$-algebras. Moreover, because we
deal with finite discrete random variables exclusively in this paper, we can afford to discuss
$\sigma$-algebras more explicitly as follows.

Recall that a partition $\Pi$ of a set $A$ is a collection $\{\pi_i : i \in [k]\}$ of disjoint subsets of $A$ such
that $\bigcup_{i \in [k]} \pi_i = A$. (Throughout the paper, we use the bracket notation $[k]$ to denote the generic
index set $\{1, 2, \ldots, k\}$.) The elements of a partition $\Pi$ are usually called the parts of $\Pi$. It is
well known that there is a natural one-to-one correspondence between partitions of the sample
space and the $\sigma$-algebras: any given $\sigma$-algebra of a sample space can be generated uniquely,
via union operation, from the atomic events of the $\sigma$-algebra, while the collection of the atomic
events forms a partition of the sample space. For example, for a random variable $X : \Omega \to \mathcal{X}$,
the atomic events of the $\sigma$-algebra induced by $X$ are $X^{-1}(\{x\})$, $x \in \mathcal{X}$. For this reason, from
now on, we shall identify an information element by either its $\sigma$-algebra or its corresponding
sample-space-partition.
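For a finite sample space, the correspondence between partitions and $\sigma$-algebras can be made concrete in a few lines. The sketch below is illustrative only: it builds the $\sigma$-algebra as all unions of parts and then recovers the parts as the minimal nonempty events. The function names are assumptions, not notation from the paper.

```python
from itertools import combinations


def sigma_algebra_from_partition(parts):
    """All unions of parts (including the empty union): the generated sigma-algebra."""
    parts = [frozenset(p) for p in parts]
    events = set()
    for r in range(len(parts) + 1):
        for chosen in combinations(parts, r):
            events.add(frozenset().union(*chosen))
    return events


def atoms(events):
    """Minimal nonempty events of a finite sigma-algebra; they form a partition."""
    nonempty = [e for e in events if e]
    return {e for e in nonempty if not any(other < e for other in nonempty)}


partition = [{"a", "b"}, {"c"}, {"d"}]
events = sigma_algebra_from_partition(partition)
print(len(events))                                       # 2 ** (number of parts)
print(atoms(events) == {frozenset(p) for p in partition})
```

A partition with $k$ parts yields $2^k$ events, and going back from the events to their atoms returns the original partition.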

It is well known that the number of distinct partitions of a set of size n is the nth Bell number 
and that the Stirling number of the second kind S(n, k) counts the number of ways to partition 
a set of n elements into k nonempty parts. These two numbers, crucial to the remarkable results 
obtained by Orlitsky et al. in [7], suggest a possibly interesting connection between the notion 
of information elements discussed in this paper and the "patterns" studied in [7]. 
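Since information elements on a finite sample space are identified with partitions, the Bell numbers count the distinct information elements on a sample space of size $n$. The following sketch computes both quantities with the standard recurrence $S(n, k) = k\,S(n-1, k) + S(n-1, k-1)$; the function names are illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Number of ways to partition an n-element set into k nonempty parts."""
    if n == k:
        return 1  # includes the convention S(0, 0) = 1
    if k == 0 or k > n:
        return 0
    # Either the last element joins one of k existing parts, or it forms a new part.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def bell(n):
    """Number of partitions of an n-element set, i.e. the sum of S(n, k) over k."""
    return sum(stirling2(n, k) for k in range(n + 1))


print([bell(n) for n in range(6)])
print([stirling2(4, k) for k in range(1, 5)])
```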

C. Shannon's Legacy 

As we mentioned before, the notion of information elements was originally proposed by 
Shannon in [1]. In the same paper, Shannon also proposed a partial order for information elements 
and a lattice structure for collections of information elements. We follow Shannon and call such 
lattices information lattices in the following. 
Abstracting the notion of information elements out of their representations (random variables)
is a conceptual leap, analogous to the leap from the concrete calculation with matrices to the
study of abstract vector spaces. To this end, we formalize both the ideas of information elements
and information lattices. By identifying information elements with sample-space-partitions, we
are equipped to establish a comprehensive parallelism between information lattices and subgroup
lattices. Qualitatively, we demonstrate isomorphisms between information lattices and
certain subgroup lattices. With such isomorphisms established, quantitatively, we establish an 
approximation for the entropy structure of information lattices, consisting of joint, common, and 
many other information elements, using the log-index structures of their counterpart subgroup 
lattices. Our approximation subsumes the approximation carried out only for joint information 
elements by Chan and Yeung [2]. Building on [2], the parallelism identified in this paper reveals 
an intimate connection between information theory and group theory and suggests that group 
theory may provide suitable mathematical language to describe and study laws of information. 

The full-fledged parallelism between information lattices and subgroup lattices established
in this paper is one of our main contributions. With this intrinsic mathematical structure among
multiple information elements being uncovered, we anticipate more systematic attacks on certain 
network information problems, where a better understanding of intricate internal structures among 
multiple information elements is in urgent need. Indeed, the ideas of information elements and 
information lattices were originally motivated by network communication problems — in [1], 
Shannon wrote: 

The present note outlines a new approach to information theory which is aimed 
specifically at the analysis of certain communication problems in which there exist 
a number of sources simultaneously in operation. 

and 

Another more general problem is that of a communication system consisting of a large 
number of transmitting and receiving points with some type of interconnecting network 
between the various points. The problem here is to formulate the best system design 
whereby, in some sense, the best overall use of the available facilities is made. 

It is not hard to see that Shannon was attempting to solve now-well-known network coding
capacity problems.
Certainly, we do not claim that all the ideas in this paper are our own. For example, as
we pointed out previously, the notions of information elements and information lattices were 
proposed as early as the 1950s by Shannon [1]. However, this paper of Shannon's is not 
well recognized, perhaps owing to the abstruseness of the ideas. Formalizing these ideas and 
connecting them to current research is one of the primary goals of this paper. For all other 
results and ideas that have been previously published, we separate them from those of our own 
by giving detailed references to their original sources. 

D. Organization 

The paper is organized as follows. In Section II, we introduce a "being-richer-than" partial
order between information elements and study the information lattices induced by this partial
order. In Section III, we formally establish isomorphisms between information lattices and
subgroup lattices. Section IV is devoted to the quantitative aspects of information lattices. We
show that the entropy structure of information lattices can be approximated by the log-index
structure of their corresponding subgroup lattices. As a consequence of this approximation result,
in Section V we show that any continuous law holds for the entropies of common and joint
information if and only if the same law holds for the log-indices of subgroups. As an application
of this result, we show, rather surprisingly, that unlike joint information, common information
obeys neither the submodularity nor the supermodularity law in general. We
conclude the paper with a discussion in Section VI.

II. Information Lattices 
A. "Being-richer-than" Partial Order 

Recall that every information element can be identified with its corresponding sample-space-partition.
Consider two sample-space-partitions $\Pi$ and $\Pi'$. We say that $\Pi$ is finer than $\Pi'$, or $\Pi'$
is coarser than $\Pi$, if each part of $\Pi$ is contained in some part of $\Pi'$.

Definition 3: For two information elements $m_1$ and $m_2$, we say that $m_1$ is richer than $m_2$,
or $m_2$ is poorer than $m_1$, if the sample-space-partition of $m_1$ is finer than that of $m_2$. In this
case, we write $m_1 \geq m_2$.

It is easy to verify that the above defined "being-richer-than" relation is a partial order. 

We have the following immediate observations: 
Proposition 4: $m_1 \geq m_2$ if and only if $H(m_2 \mid m_1) = 0$.

As a corollary to the above proposition, we have 

Proposition 5: If $m_1 \geq m_2$, then $H(m_1) \geq H(m_2)$.
The converse of Proposition 5 does not hold in general.

With respect to representative random variables of information elements, we have 

Proposition 6: Suppose random variables $X_1$ and $X_2$ represent information elements $m_1$ and
$m_2$ respectively. Then, $m_1 \geq m_2$ if and only if $X_2 = f(X_1)$ for some function $f$.
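The "being-richer-than" order and Proposition 4 can be illustrated on a small example. The sketch below assumes partitions are given as lists of frozensets over a finite sample space with point probabilities in a dictionary; the names `is_finer` and `conditional_entropy` and the two example partitions are illustrative choices.

```python
from math import log2


def is_finer(p1, p2):
    """True if every part of p1 is contained in some part of p2."""
    return all(any(a <= b for b in p2) for a in p1)


def conditional_entropy(p2, p1, prob):
    """H(m2 | m1) in bits, for the partitions p2 and p1 of m2 and m1."""
    h = 0.0
    for a in p1:
        mass_a = sum(prob[w] for w in a)
        for b in p2:
            mass_ab = sum(prob[w] for w in a & b)
            if mass_ab > 0:
                h -= mass_ab * log2(mass_ab / mass_a)
    return h


prob = {w: 0.25 for w in (1, 2, 3, 4)}
fine = [frozenset({1}), frozenset({2}), frozenset({3, 4})]
coarse = [frozenset({1, 2}), frozenset({3, 4})]

print(is_finer(fine, coarse))                   # the finer partition is the richer element
print(conditional_entropy(coarse, fine, prob))  # vanishes, as in Proposition 4
print(conditional_entropy(fine, coarse, prob))  # positive in the other direction
```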

A similar result to Proposition 6 was previously observed by Rényi [8] as well.

The "being-richer-than" relation is very important to information theory, because it characterizes
the only universal information-theoretic constraint put on all deterministic coders (decoders):
the input information element of any coder is always richer than the output information element.
For example, partially via this principle, Yan et al. recently characterized the capacity region of
general acyclic multi-source multi-sink networks [9]. Harvey et al. [10] obtained an improved
computable outer bound for general network coding capacity regions by applying this same
principle under a different name, information dominance. The authors of that paper acknowledged:
"...information dominance plays a key role in our investigation of network capacity."

B. Information Lattices 

Recall that a lattice is a set endowed with a partial order in which any two elements have 
a unique supremum and a unique infimum with respect to the partial order. Conventionally, 
the supremum of two lattice elements x and y is also called the join of x and y; the infimum 
is also called the meet. In our case, with respect to the "being-richer-than" partial order, the 
supremum of two information elements $m_1$ and $m_2$, denoted $m_1 \vee m_2$, is the poorest among all
the information elements that are richer than both $m_1$ and $m_2$. Conversely, the infimum of $m_1$
and $m_2$, denoted $m_1 \wedge m_2$, is the richest among all the information elements that are poorer than
both $m_1$ and $m_2$. In the following, we also use $m^{12}$ to denote the join of $m_1$ and $m_2$, and $m_{12}$
the meet.

Definition 4: An information lattice is a set of information elements that is closed under the
join $\vee$ and meet $\wedge$ operations.
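In terms of sample-space-partitions, the definitions above say that the join under the "being-richer-than" order is the coarsest partition finer than both (the common refinement), and the meet is the finest partition coarser than both. The following sketch computes both for a small example; the function names, the uniform probabilities, and the example partitions are assumptions made for illustration.

```python
from math import log2


def common_refinement(p, q):
    """Join in the richer-than order: all nonempty pairwise intersections of parts."""
    return frozenset(a & b for a in p for b in q if a & b)


def finest_common_coarsening(p, q):
    """Meet in the richer-than order: merge parts of p and q that overlap, transitively."""
    blocks = []
    for part in list(p) + list(q):
        part = set(part)
        untouched = []
        for block in blocks:
            if block & part:
                part |= block  # absorb every block that overlaps the new part
            else:
                untouched.append(block)
        blocks = untouched + [part]
    return frozenset(frozenset(b) for b in blocks)


def entropy(partition, prob):
    """Shannon entropy (in bits) of the information element given by a partition."""
    masses = [sum(prob[w] for w in part) for part in partition]
    return -sum(m * log2(m) for m in masses if m > 0)


prob = {w: 0.25 for w in "abcd"}
m1 = frozenset({frozenset("ab"), frozenset("cd")})
m2 = frozenset({frozenset("ac"), frozenset("bd")})

join = common_refinement(m1, m2)
meet = finest_common_coarsening(m1, m2)
print(sorted(sorted(part) for part in join))
print(sorted(sorted(part) for part in meet))
print(entropy(join, prob), abs(entropy(meet, prob)))
```

In this example the join separates all four sample points, while the meet is the trivial one-part partition.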

Recall the one-to-one correspondence between information elements and sample-space-partitions.
Consequently, each information lattice corresponds to a partition lattice (with respect to the
"being-finer-than" partial order on partitions), and vice versa. This formally confirms the
assertions made in [1]: "they (information lattices) are at least as general as the class of finite
partition lattices."

Since the collection of information lattices could be as general as that of partition lattices, 
we should not expect any special lattice properties to hold generally for all information lattices, 
because it is well-known that any finite lattice can be embedded in a finite partition lattice [11]. 
Therefore, it is not surprising to learn that information lattices are in general not distributive, 
not even modular. 

C. Joint Information Element 

The join of two information elements is straightforward. Consider two information elements 
$m_1$ and $m_2$ represented respectively by two random variables $X_1$ and $X_2$. It is easy to check
that the joint random variable $(X_1, X_2)$ represents the join $m^{12}$. For this reason, we also call
$m^{12}$ (or $m_1 \vee m_2$) the joint information element of $m_1$ and $m_2$. It is worth pointing out that the
joint random variable $(X_2, X_1)$ represents $m^{12}$ equally well.

D. Common Information Element 

In [1], the meet of two information elements is called common information. More than
twenty years later, the same notion of common information was independently proposed and
first studied in detail by Gács and Körner [12]. For the first time, it was demonstrated that
common information could be far less than mutual information. ("Mutual information" is rather
a misnomer because it does not correspond naturally to any information element [12].) Unlike
the case of joint information elements, characterizing common information elements via their
representing random variables is much more complicated. See [12], [13] for details.

In contrast to the all-familiar joint information, common information receives far less attention.
Nonetheless, it has been shown to be important to cryptography [14], [15], [16], [17],
indispensable for characterizing the capacity region of multi-access channels with correlated
sources [18], useful in studying information inequalities [19], [20], and relevant to network 
coding problems [21]. 
E. Previously Studied Lattices in Information Theory

Historically, at least three other lattices [22], [23], [24] have been considered in attempts 
to characterize certain ordering relations between information elements. Two of them, studied 
respectively in [22] and [24], are subsumed by the information lattices considered in this paper. 

III. Isomorphisms between Information Lattices and Subgroup Lattices 

In this section, we discuss the qualitative aspects of the parallelism between information 
lattices generated from sets of information elements and subgroup lattices generated from sets 
of subgroups. In particular, we establish isomorphism relations between them.

A. Information Lattices Generated by Information Element Sets 

It is easy to verify that both the binary operations "$\vee$" and "$\wedge$" are associative and commutative.
Thus, we can readily extend them to cases of more than two information elements. Accordingly,
for a given set $\{m_i : i \in [n]\}$ of information elements, we denote the joint information element
of the subset $\{m_i : i \in \alpha\}$, $\alpha \subseteq [n]$, of information elements by $m^\alpha$ and the common information
element by $m_\alpha$.

Definition 5: Given a set $\mathbf{M} = \{m_i : i \in [n]\}$ of information elements, the information lattice
generated by $\mathbf{M}$, denoted $L_{\mathbf{M}}$, is the smallest information lattice that contains $\mathbf{M}$. We call $\mathbf{M}$
the generating set of the lattice $L_{\mathbf{M}}$.

It is easy to see that each information element in $L_{\mathbf{M}}$ can be obtained from the information
elements in the generating set $\mathbf{M}$ via a sequence of join and meet operations. Note that the set
$\{m_\alpha : \alpha \subseteq [n]\}$ of information elements forms a meet semi-lattice and the set $\{m^\beta : \beta \subseteq [n]\}$
forms a join semi-lattice. However, the union $\{m_\alpha, m^\beta : \alpha, \beta \subseteq [n]\}$ of these two semi-lattices
does not necessarily form a lattice. To see this, consider the following example constructed with
partitions (since partitions are in one-to-one correspondence with information elements). Let
$\{\pi_i : i \in [4]\}$ be a collection of partitions on the set $\{1, 2, 3, 4\}$ where $\pi_1 = 12|3|4$, $\pi_2 = 14|2|3$,
$\pi_3 = 23|1|4$, and $\pi_4 = 34|1|2$. See Figure 1 for the Hasse diagram of the lattice generated by
the collection $\{\pi_i : i \in [4]\}$. It is easy to see $(\pi_1 \vee \pi_2) \wedge (\pi_3 \vee \pi_4) = 124|3 \wedge 234|1 = 24|1|3$,
but $24|1|3 \notin \{\pi_\alpha, \pi^\beta : \alpha, \beta \subseteq [4]\}$. Similarly, we have $(\pi_1 \vee \pi_3) \wedge (\pi_2 \vee \pi_4) = 13|2|4 \notin \{\pi_\alpha, \pi^\beta :
\alpha, \beta \subseteq [4]\}$.
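The partition example above can be verified mechanically. The sketch below reads the displayed computations literally: there, $\vee$ merges overlapping parts (for instance $12|3|4 \vee 14|2|3 = 124|3$) and $\wedge$ intersects parts (for instance $124|3 \wedge 234|1 = 24|1|3$). It then enumerates every element obtained from a nonempty subset of the four partitions with $\vee$ alone or with $\wedge$ alone. All function names are illustrative.

```python
from functools import reduce
from itertools import combinations


def parse(text):
    """Turn a string such as "12|3|4" into a partition (a frozenset of frozensets)."""
    return frozenset(frozenset(int(ch) for ch in part) for part in text.split("|"))


def wedge(p, q):
    """Intersect parts pairwise, as in 124|3 ^ 234|1 = 24|1|3."""
    return frozenset(a & b for a in p for b in q if a & b)


def vee(p, q):
    """Merge overlapping parts transitively, as in 12|3|4 v 14|2|3 = 124|3."""
    blocks = []
    for part in list(p) + list(q):
        part = set(part)
        untouched = []
        for block in blocks:
            if block & part:
                part |= block
            else:
                untouched.append(block)
        blocks = untouched + [part]
    return frozenset(frozenset(b) for b in blocks)


pi = [parse(s) for s in ("12|3|4", "14|2|3", "23|1|4", "34|1|2")]

# Everything reachable with only one of the two operations.
subsets = [c for r in range(1, 5) for c in combinations(pi, r)]
pure = {reduce(vee, c) for c in subsets} | {reduce(wedge, c) for c in subsets}

mixed_1 = wedge(vee(pi[0], pi[1]), vee(pi[2], pi[3]))
mixed_2 = wedge(vee(pi[0], pi[2]), vee(pi[1], pi[3]))
print(mixed_1 == parse("24|1|3"), mixed_1 in pure)
print(mixed_2 == parse("13|2|4"), mixed_2 in pure)
```

Both mixed expressions evaluate to the partitions stated in the text, and neither belongs to the union of the two semi-lattices.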
Fig. 1. Lattice Generated by $\{\pi_i : i \in [4]\}$

B. Subgroup Lattices 

Consider the binary operations on subgroups: intersection and union. We know that the
intersection $G_1 \cap G_2$ of two subgroups is again a subgroup. However, the union $G_1 \cup G_2$ does
not necessarily form a subgroup. Therefore, we consider the subgroup generated from the union
$G_1 \cup G_2$, denoted $G^{12}$ (or $G_1 \vee G_2$). Similar to the case of information elements, the intersection
and "$\vee$" operations on subgroups are both associative and commutative. Therefore, we readily
extend the two operations to the cases with more than two subgroups and, accordingly, denote
the intersection $\bigcap_{i \in [n]} G_i$ of a set of subgroups $\{G_i : i \in [n]\}$ by $G_{[n]}$ and the subgroup generated
from the union by $G^{[n]}$. It is easy to verify that the subgroups $G_{[n]}$ and $G^{[n]}$ are the infimum and
the supremum of the set $\{G_i : i \in [n]\}$ with respect to the "being-a-subgroup-of" partial order.
For notation consistency, we also use "$\wedge$" to denote the intersection operation.

Note that, to keep the notation simple, we "overload" the symbols "$\vee$" and "$\wedge$" for both the
join and the meet operations with information elements and the intersection and the
"union-generating" operations with subgroups. Their actual meaning should be clear within context.

Definition 6: A subgroup lattice is a set of subgroups that is closed under the $\wedge$ and $\vee$
operations.

For example, the set of all the subgroups of a group forms a lattice. 

Similar to the case of information lattices generated by sets of information elements, we 
consider in the following subgroup lattices generated by a set of subgroups. 

Definition 7: Given a set $\mathbf{G} = \{G_i : i \in [n]\}$ of subgroups, the subgroup lattice generated by
$\mathbf{G}$, denoted $L_{\mathbf{G}}$, is the smallest subgroup lattice that contains $\mathbf{G}$. We call $\mathbf{G}$ the generating set of $L_{\mathbf{G}}$.

Note that the set $\{G_\alpha : \alpha \subseteq [n]\}$ forms a semilattice under the meet $\wedge$ operation and the
set $\{G^\beta : \beta \subseteq [n]\}$ forms a semilattice under the join $\vee$ operation. However, as in the case of
information lattices, the union $\{G_\alpha, G^\beta : \alpha, \beta \subseteq [n]\}$ of the two semilattices does not necessarily
form a lattice.

In the remainder of this section, we relate information lattices generated by sets of information
elements and subgroup lattices generated by collections of subgroups and demonstrate isomorphism
relations between them. For ease of presentation, as a special case we first introduce
an isomorphism between information lattices generated by sets of coset-partition information
elements and their corresponding subgroup lattices.

C. Special Isomorphism Theorem 

We endow the sample space with a group structure: the sample space in question is taken to
be a group $G$. For any subgroup of $G$, by Lagrange's theorem [25], the collection of its cosets
forms a partition of $G$. Certainly, the coset-partition, as a sample-space-partition, uniquely defines
an information element. A collection $\mathbf{G} = \{G_i : i \in [n]\}$ of subgroups of $G$, in the same spirit,
identifies a set $\mathbf{M} = \{m_i : i \in [n]\}$ of information elements via this subgroup-coset-partition
correspondence.

Remark: throughout the paper, groups are taken to be multiplicative, and cosets are taken to 
be right cosets. 

It is clear that, by our construction, the information elements in $\mathbf{M}$ and the subgroups in $\mathbf{G}$
are in one-to-one correspondence via the subgroup-coset-partition relation. It turns out that the
information elements on the entire information lattice $L_{\mathbf{M}}$ and the subgroups on the subgroup
lattice $L_{\mathbf{G}}$ are in one-to-one correspondence as well via the same subgroup-coset-partition
relation. In other words, both the join and meet operations on information lattices are faithfully
"mirrored" by the join and meet operations on subgroup lattices.

Theorem 1: (Special Isomorphism Theorem) Given a set $\mathbf{G} = \{G_i : i \in [n]\}$ of subgroups,
the subgroup lattice $L_{\mathbf{G}}$ is isomorphic to the information lattice $L_{\mathbf{M}}$ generated by the set $\mathbf{M} =
\{m_i : i \in [n]\}$ of information elements, where $m_i$, $i \in [n]$, are accordingly identified via the
coset-partitions of the subgroups $G_i$, $i \in [n]$.
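The subgroup-coset-partition correspondence can be tried out on a small group. The sketch below takes the symmetric group on three points as the sample space, which is an illustrative choice and not an example from the paper, and encodes permutations as tuples. It checks two standard facts: all cosets of a subgroup have the same size, and the coset-partition of an intersection $G_1 \cap G_2$ is the common refinement of the coset-partitions of $G_1$ and $G_2$. The helper names are hypothetical.

```python
from itertools import permutations


def compose(p, q):
    """Composition of permutations given as tuples: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))


def generate(gens, n):
    """Subgroup generated by gens: closure under composition (enough in a finite group)."""
    identity = tuple(range(n))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return frozenset(group)


def right_cosets(subgroup, group):
    """Coset-partition of group: the right cosets {h∘g : h in subgroup}."""
    return frozenset(frozenset(compose(h, g) for h in subgroup) for g in group)


def common_refinement(p, q):
    """All nonempty pairwise intersections of parts of two partitions."""
    return frozenset(a & b for a in p for b in q if a & b)


S3 = frozenset(permutations(range(3)))  # the sample space, taken to be a group
G1 = generate([(1, 0, 2)], 3)           # subgroup generated by swapping points 0 and 1
G2 = generate([(0, 2, 1)], 3)           # subgroup generated by swapping points 1 and 2

part1, part2 = right_cosets(G1, S3), right_cosets(G2, S3)
print(sorted(len(c) for c in part1))    # equal-sized parts
print(right_cosets(G1 & G2, S3) == common_refinement(part1, part2))
print(len(right_cosets(generate(list(G1 | G2), 3), S3)))
```

The last line shows that the subgroup generated by $G_1 \cup G_2$ is the whole group here, so its coset-partition has a single part.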

The theorem is shown by demonstrating a mapping, from the subgroup lattice $L_{\mathbf{G}}$ to the
information lattice $L_{\mathbf{M}}$, such that it is a lattice-morphism, i.e., it honors both join and meet
operations, and is bijective as well. Naturally, the mapping from $L_{\mathbf{G}}$ to $L_{\mathbf{M}}$ assigning to each
subgroup in $L_{\mathbf{G}}$ the information element identified by the coset-partition of that subgroup
is such a morphism. Since this theorem and its general version, Theorem 2, are crucial to our
later results (Theorems 3 and 5) and certain aspects of the reasoning are novel, we include a
detailed proof for it in Appendix I.

D. General Isomorphism Theorem 

The information lattices considered in Section III-C are rather limited: by Lagrange's theorem,
coset-partitions are all equal partitions. In this subsection, we consider arbitrary information
lattices, and we do not require the sample space to be a group. Instead, we treat a general
sample-space-partition as an orbit-partition resulting from some group-action on the sample space.

1) Group-Actions and Permutation Groups:

Definition 8: Given a group $G$ and a set $A$, a group-action of $G$ on $A$ is a function $(g, a) \mapsto
g(a)$, $g \in G$, $a \in A$, that satisfies the following two conditions:

• $(g_1 g_2)(a) = g_1(g_2(a))$ for all $g_1, g_2 \in G$ and $a \in A$;

• $e(a) = a$ for all $a \in A$, where $e$ is the identity of $G$.
We write $(G, A)$ to denote the group-action.

Now, we turn to the notions of orbits and orbit-partitions. We shall see that every group-action
$(G, A)$ induces unambiguously an equivalence relation as follows. We say that $x_1$ and $x_2$ are
connected under a group-action $(G, A)$ if there exists a $g \in G$ such that $x_2 = g(x_1)$. We write
$x_1 \overset{G}{\sim} x_2$. It is easy to check that this "being-connected" relation $\overset{G}{\sim}$ is an equivalence relation on
$A$. By the fundamental theorem of equivalence relations, it defines a partition on $A$.

Definition 9: Given a group-action $(G, A)$, we call the equivalence classes with respect to the
equivalence relation $\overset{G}{\sim}$, or the parts of the induced partition of $A$, the orbits of the group-action.
Accordingly, we call the induced partition the orbit-partition of $(G, A)$.
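Orbits are straightforward to compute when the group acts by permutations of a finite set. The sketch below assumes the set is $\{0, \ldots, n-1\}$ and that the group is given by a list of generating permutations encoded as tuples, where `g[x]` is the image of `x`; both the encoding and the function name `orbits` are illustrative assumptions.

```python
def orbits(generators, n):
    """Orbit-partition of {0, ..., n-1} under the group generated by the given
    permutations. Following generators forward is enough, because in a finite
    group every inverse is a power of the element itself."""
    remaining = set(range(n))
    result = []
    while remaining:
        start = min(remaining)
        orbit = {start}
        frontier = [start]
        while frontier:
            x = frontier.pop()
            for g in generators:
                y = g[x]  # image of x under the permutation g
                if y not in orbit:
                    orbit.add(y)
                    frontier.append(y)
        result.append(frozenset(orbit))
        remaining -= orbit
    return frozenset(result)


# Two commuting swaps: one exchanges 0 and 1, the other exchanges 2 and 3.
print(sorted(sorted(o) for o in orbits([(1, 0, 2, 3), (0, 1, 3, 2)], 4)))
# A 3-cycle on {0, 1, 2} that fixes 3.
print(sorted(sorted(o) for o in orbits([(1, 2, 0, 3)], 4)))
```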

2) Sample-Space-Partition as Orbit-Partition: In fact, starting with a partition $\Pi$ of a set $A$,
we can go in the other direction and unambiguously define a group action $(G, A)$ such that the
orbit-partition of $(G, A)$ is exactly the given partition $\Pi$. To see this, note the following salient
feature of group-actions: For any given group-action $(G, A)$, associated with every element $g$ in
the group is a mapping from $A$ to itself, and any such mapping must be bijective. This feature
is a direct consequence of the group axioms. To see this, note that every group element $g$
has a unique inverse $g^{-1}$. According to the first defining property of group-actions, we have
$(g g^{-1})(x) = g(g^{-1}(x)) = e(x) = x$ for all $x \in A$. This requires the mappings associated
with $g$ and $g^{-1}$ to be invertible. Clearly, the identity $e$ of the group corresponds to the identity
map from $A$ to $A$.

With the observation that under group-action $(G, A)$ every group element corresponds to a
permutation of $A$, we can treat every group as a collection of permutations that is closed under
permutation composition. Specifically, for a given partition $\Pi$ of a set $A$, it is easy to check that all
the permutations of $A$ that permute the elements of the parts of $\Pi$ only to the elements of the same
parts form a group. These permutations altogether form the so-called permutation representation
of $G$ (with respect to $A$). For this reason in the following, without loss of generality, we treat
all groups as permutation groups. We denote by $G_\Pi$ the permutation group corresponding as
above to a partition $\Pi$; $G_\Pi$ acts naturally on the set $A$ by permutation, and the orbit-partition
of $(G_\Pi, A)$ is exactly $\Pi$.
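The construction of $G_\Pi$ can be checked by brute force on a small set. The sketch below assumes $A = \{0, \ldots, n-1\}$, enumerates all permutations of $A$, keeps those that map every element into its own part, and confirms that the orbit-partition of the resulting group is the partition we started from. The function names are illustrative, and the enumeration is only practical for small $n$.

```python
from itertools import permutations
from math import factorial, prod


def part_preserving_permutations(partition, n):
    """All permutations of {0, ..., n-1} sending each element to an element of its own part."""
    part_of = {x: i for i, part in enumerate(partition) for x in part}
    return [p for p in permutations(range(n))
            if all(part_of[p[x]] == part_of[x] for x in range(n))]


def orbit_partition(group, n):
    """Orbit-partition under a full list of group elements (not merely generators)."""
    return frozenset(frozenset(p[x] for p in group) for x in range(n))


partition = [frozenset({0, 1, 2}), frozenset({3})]
group = part_preserving_permutations(partition, 4)

print(len(group), prod(factorial(len(part)) for part in partition))
print(orbit_partition(group, 4) == frozenset(partition))
```

The group order equals the product of the factorials of the part sizes, since each part can be permuted independently.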

From group theory, we know that this orbit-partition-permutation-group-action relation is
a one-to-one correspondence. Since every information element corresponds definitively to a
sample-space-partition, we can identify every information element with a permutation group.
Given a set $M = \{m_i : i \in [n]\}$ of information elements, denote the set of the corresponding
permutation groups by $\mathbf{G} = \{G_i : i \in [n]\}$. Note that all the permutations in the
permutation groups $G_i$, $i \in [n]$, are permutations of the same set, namely the sample space
$\Omega$. Hence, all the permutation groups $G_i$, $i \in [n]$, are subgroups of the symmetric group
$S_{|\Omega|}$, which has order $|\Omega|!$. Therefore, it makes sense to take intersections and
unions of groups from the collection $\mathbf{G}$.

3) From Coset-Partition to Orbit-Partition — From Equal Partition to General Partition: 
In fact, the previously studied coset-partitions are a special kind of orbit-partition. They are
orbit-partitions of group-actions defined by the native group multiplication. Specifically, given a
subgroup $G_1$ of $G$, a group-action $(G_1, G)$ is defined such that $g_1(a) = g_1 \circ a$ for all
$g_1 \in G_1$ and $a \in G$, where "$\circ$" denotes the native binary operation of the group $G$.
The orbit-partition of such a group-action is exactly the coset-partition of the subgroup $G_1$.
Therefore, by taking a different kind of group-action (permutation rather than group
multiplication), we are freed from the "equal-partition" restriction, so that arbitrary information
elements, identified with arbitrary sample-space-partitions, can be put in correspondence with
subgroups. It turns out that information lattices generated by sets of information elements and
subgroup lattices generated by the corresponding sets of permutation groups remain isomorphic to
each other. Thus, the isomorphism relation between information lattices and subgroup lattices holds
in full generality.
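
As a small illustration of the multiplication action, the sketch below takes $G = S_3$ and a two-element subgroup $G_1$ (both choices are hypothetical examples, as are the function names). It computes the orbits of $a \mapsto g_1 \circ a$ and checks that they form an equal partition whose parts satisfy the usual coset criterion.

```python
from itertools import permutations

def compose(g, h):
    """Product g o h of permutations stored as tuples: apply h first, then g."""
    return tuple(g[h[x]] for x in range(len(h)))

def inverse(g):
    """Inverse permutation of g."""
    inv = [0] * len(g)
    for x, y in enumerate(g):
        inv[y] = x
    return tuple(inv)

G = set(permutations(range(3)))        # the group S_3 acting on {0, 1, 2}
G1 = {(0, 1, 2), (1, 0, 2)}            # subgroup generated by one transposition

# Orbit of a in G under the action g1(a) = g1 o a, with g1 ranging over G1.
orbit_partition = {frozenset(compose(g1, a) for g1 in G1) for a in G}
part_sizes = sorted(len(part) for part in orbit_partition)

def same_part(a, b):
    """True when a and b lie in the same orbit."""
    return any(a in part and b in part for part in orbit_partition)

# Coset criterion: a and b share a right coset of G1 iff a o b^{-1} lies in G1.
criterion_agrees = all(
    same_part(a, b) == (compose(a, inverse(b)) in G1) for a in G for b in G
)
```

All three orbits have size $|G_1| = 2$, which is the "equal-partition" restriction that the permutation action removes.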

4) Isomorphism Relation Remains Between Information Lattices and Subgroup Lattices: 
Similar to Section III-C, we consider a set $M = \{m_i : i \in [n]\}$ of information elements. Unlike
in Section III-C, the information elements $m_i$, $i \in [n]$, considered here are arbitrary. As
discussed above, with each information element $m_i$ we associate a permutation group
$G_i$ according to the orbit-partition-permutation-group-action correspondence. Denote the set of
corresponding permutation groups by $\mathbf{G} = \{G_i : i \in [n]\}$.

Theorem 2: (General Isomorphism Theorem) The information lattice $L_M$ is isomorphic to the
subgroup lattice $L_{\mathbf{G}}$.

The arguments for Theorem 2 are similar to those for Theorem 1: we demonstrate that the
orbit-partition-permutation-group-action correspondence is a lattice isomorphism between $L_M$
and $L_{\mathbf{G}}$.



IV. An Approximation Theorem

From this section on, we shift our focus to the quantitative aspects of the parallelism between
information lattices and subgroup lattices. In the previous section, by generalizing from
coset-partitions to orbit-partitions, we successfully established an isomorphism between general
information lattices and subgroup lattices. In this section, we shall see that not only is the
qualitative structure preserved, but also the quantitative structure, namely the entropy structure
of information lattices, is essentially captured by their isomorphic subgroup lattices.

A. Entropies of Coset-partition Information Elements

We start with a simple and straightforward observation for the entropies of coset-partition
information elements on information lattices.

Proposition 7: Let $\{G_i : i \in [n]\}$ be a set of subgroups of a group $G$ and $\{m_i : i \in [n]\}$
be the set of corresponding coset-partition information elements. The entropies of the joint and
common information elements on the information lattice generated from $\{m_i : i \in [n]\}$ can be
calculated from the subgroup lattice generated from $\{G_i : i \in [n]\}$ as follows:

$$h(m^{[n]}) = \log \frac{|G|}{\left| \bigcap_{i \in [n]} G_i \right|} \tag{1}$$

and

$$h(m_{[n]}) = \log \frac{|G|}{\left| \bigvee_{i \in [n]} G_i \right|}. \tag{2}$$




Proposition 7 follows easily from the isomorphism relation established by Theorem 2.

Note that the right-hand sides of both Equations (1) and (2) are the logarithms of the indices
of subgroups. In the following, we shall call them, in short, log-indices.

Proposition 7 establishes a quantitative relation between the entropies of the information
elements on coset-partition information lattices and the log-indices of the subgroups on the
isomorphic subgroup lattices. This quantitative relation is exact. However, the scope of
Proposition 7 is rather restrictive: it applies only to a special kind of "uniform" information
elements, because, by Lagrange's theorem, all coset-partitions are equal partitions.
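
The exact relation between coset-partition entropies and log-indices can be verified on a small abelian example. The sketch below is only an illustration under stated assumptions: it takes the additive group $Z_{12}$ with the subgroups generated by 4 and by 6 (hypothetical choices), uses the uniform measure on the group, and measures entropy in bits. Joint information is computed as the common refinement of the two coset-partitions and common information as their finest common coarsening.

```python
from math import log2

def generated(n, generators):
    """Subgroup of the additive group Z_n generated by the given elements."""
    elements = {0}
    while True:
        bigger = elements | {(a + g) % n for a in elements for g in generators}
        if bigger == elements:
            return elements
        elements = bigger

def coset_partition(n, subgroup):
    """Cosets of a subgroup of Z_n."""
    return {frozenset((a + h) % n for h in subgroup) for a in range(n)}

def joint(p, q):
    """Joint information: common refinement (nonempty pairwise intersections)."""
    return {a & b for a in p for b in q if a & b}

def common(p, q):
    """Common information: merge parts of p that are linked by a part of q."""
    parts = [set(a) for a in p]
    for b in q:
        touched = [a for a in parts if a & b]
        parts = [a for a in parts if not (a & b)] + [set().union(*touched)]
    return {frozenset(a) for a in parts}

def entropy(partition, n):
    """Entropy in bits of a partition under the uniform measure on n points."""
    return -sum(len(a) / n * log2(len(a) / n) for a in partition)

n = 12
G1, G2 = generated(n, [4]), generated(n, [6])
G_meet = G1 & G2                      # intersection of the two subgroups
G_join = generated(n, G1 | G2)        # subgroup generated by their union

m1, m2 = coset_partition(n, G1), coset_partition(n, G2)
h_joint = entropy(joint(m1, m2), n)
h_common = entropy(common(m1, m2), n)
```

Here the joint entropy matches the log-index of the intersection and the common entropy matches the log-index of the generated subgroup.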

In Section III, by generalizing from coset-partitions to orbit-partitions, we successfully removed
the "uniformness" restriction imposed by the coset-partition structure. At the same time, we
established a new isomorphism relation, namely the orbit-partition-permutation-group-action
correspondence, between information lattices and subgroup lattices. It turns out that this
generalization maintains a "rough" version of the quantitative relation established in
Proposition 7 between the entropies of information lattices and the log-indices of their isomorphic
permutation-subgroup lattices. As we shall see in the next section, the entropies of the
information elements on information lattices can be approximated, up to arbitrary precision, by
the log-indices of the permutation groups on their isomorphic subgroup lattices.

B. Subgroup Approximation Theorem 

To discuss the approximation formally, we introduce two definitions as follows. 
Definition 10: Given an information lattice $L_M$ generated from a set $M = \{m_i : i \in [n]\}$ of
information elements, we call the real vector

$$\big( H(m) : m \in L_M \big),$$

whose components are the entropies of the information elements on the information lattice $L_M$
generated by $M$, listed according to a certain prescribed order, the entropy vector of $L_M$,
denoted $h(L_M)$.

The entropy vector $h(L_M)$ captures the informational structure among the information elements
of $M$.

Definition 11: Given a subgroup lattice $L_{\mathbf{G}}$ generated from a set
$\mathbf{G} = \{G_i : i \in [n]\}$ of subgroups of a group $G$, we call the real vector
whose components are the normalized log-indices of the subgroups on the subgroup lattice
$L_{\mathbf{G}}$ generated by $\mathbf{G}$, listed according to a certain prescribed order, the
normalized log-index vector of $L_{\mathbf{G}}$, denoted $l(L_{\mathbf{G}})$.

In the following, we assume that $l(L_{\mathbf{G}})$ and $h(L_M)$ are accordingly aligned.

Theorem 3: Let $M = \{m_i : i \in [n]\}$ be a set of information elements. For any $\epsilon > 0$,
there exists an $N > 0$ and a set $\mathbf{G}^N = \{G_i : i \in [n]\}$ of subgroups of the symmetric
group $S_N$ of order $N!$ such that

$$\left\| h(L_M) - l(L_{\mathbf{G}^N}) \right\| < \epsilon, \tag{3}$$

where "$\|\cdot\|$" denotes the norm of real vectors.

Theorem 3 subsumes the approximation carried out by Chan and Yeung in [2], which is limited
to joint entropies. The approximation procedure we carry out to prove Theorem 3 is similar to
that of Chan and Yeung [2]: both use Stirling's approximation formula for factorials. But, with
the group-action relation between information elements and permutation groups being exposed,
and the isomorphism between information lattices and subgroup lattices being revealed, the
approximation procedure becomes transparent and the seemingly surprising connection between
information theory and group theory becomes mathematically natural. For these reasons, we
include a detailed proof in Appendix II.

V. Parallelism between Continuous Laws of Information Elements and those of Subgroups

As a consequence of Theorem 3, we shall see in the following that if a continuous law holds in
general for information elements, then the same law must hold for the log-indices of subgroups,
and vice versa.

In the following, for reference and comparison purposes, we first review the known laws
concerning the entropies of joint and common information elements. These laws, usually expressed
in the form of information inequalities, are deemed to be fundamental to information theory [26].

A. Laws for Information Elements

1) Non-Negativity of Entropy:

Proposition 8: For any information element $m$, we have $H(m) \ge 0$.

2) Laws for Joint Information:

Proposition 9: Given a set $\{m_i : i \in [n]\}$ of information elements, if $\alpha \subseteq \beta$,
$\alpha, \beta \subseteq [n]$, then $H(m^\alpha) \le H(m^\beta)$.

Proposition 10: For any two sets of information elements $\{m_i : i \in \alpha\}$ and
$\{m_j : j \in \beta\}$, the following inequality holds:

$$H(m^\alpha) + H(m^\beta) \ge H(m^{\alpha \cup \beta}) + H(m^{\alpha \cap \beta}).$$

This proposition is mathematically equivalent to the following one.

Proposition 11: For any three information elements $m_1$, $m_2$, and $m_3$, the following inequality
holds:

$$H(m^{12}) + H(m^{23}) \ge H(m^{123}) + H(m^{2}).$$

Note that $H(m^2) = H(m_2)$.

Proposition 10 (or equivalently 11) is usually called the submodularity law for the entropy
function. Propositions 8, 9, and 10 are known, collectively, as the polymatroidal axioms [27], [28].
Up until very recently, these were the only known laws for entropies of joint information elements.

In 1998, Zhang and Yeung discovered a new information inequality, involving four information
elements [28].

Proposition 12: (Zhang-Yeung Inequality) For any four information elements $m_i$, $i = 1, 2, 3,$
and $4$, the following inequality holds:

$$3H(m^{13}) + 3H(m^{14}) + H(m^{23}) + H(m^{24}) + 3H(m^{34})
\ge H(m^{1}) + 2H(m^{3}) + 2H(m^{4}) + H(m^{12}) + 4H(m^{134}) + H(m^{234}). \tag{4}$$

This newly discovered inequality, classified as a non-Shannon-type information inequality [26],
proved that our understanding of the laws governing the quantitative relations between information
elements is incomplete. Recently, six more new four-variable information inequalities were
discovered by Dougherty et al. [29].

Information inequalities such as those presented above were called "laws of information" [26],
[30]. Seeking new information inequalities is currently an active research topic [28], [19], [31],
[32]. In fact, they should be more accurately called "laws of joint information", since these
inequalities involve joint information only. We shall see below laws involving common
information.

3) Common Information vs. Mutual Information: In contrast to joint information, little research
has been done on laws involving common information. So far, the only known non-trivial law
involving both joint information and common information is stated in the following
proposition, discovered by Gács and Körner [12].

Proposition 13: For any two information elements $m_1$ and $m_2$, the following inequality holds:

$$H(m_{12}) \le I(m_1; m_2) = H(m^{1}) + H(m^{2}) - H(m^{12}).$$

Note that $m^1 = m_1$ and $m^2 = m_2$.
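
The inequality of Proposition 13 can be tried out numerically. The following is a minimal sketch, assuming the two information elements are given through a joint probability matrix of two random variables (the matrices and function names below are hypothetical examples). It uses the standard characterization of common information through the support of the joint distribution: rows and columns linked by nonzero entries fall into the same part of the common partition.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits; zero-probability entries are skipped."""
    return -sum(p * log2(p) for p in probs if p > 0)

def common_information(joint):
    """Entropy of the finest partition coarser than both marginal partitions.

    Row x and column y are linked whenever joint[x][y] > 0; the parts are the
    connected components of the resulting bipartite graph.
    """
    rows, cols = len(joint), len(joint[0])
    parent = list(range(rows + cols))      # union-find; column y is node rows + y

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for x in range(rows):
        for y in range(cols):
            if joint[x][y] > 0:
                parent[find(x)] = find(rows + y)
    mass = {}
    for x in range(rows):
        root = find(x)
        mass[root] = mass.get(root, 0.0) + sum(joint[x])
    return entropy(mass.values())

def mutual_information(joint):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    pxy = [p for row in joint for p in row]
    return entropy(px) + entropy(py) - entropy(pxy)

# Hypothetical joint distributions, rows indexed by m1 and columns by m2.
block_diagonal = [[0.25, 0.25, 0.0],
                  [0.0,  0.0,  0.5]]
full_support = [[0.4, 0.1],
                [0.1, 0.4]]

results = {name: (common_information(j), mutual_information(j))
           for name, j in (("block", block_diagonal), ("full", full_support))}
```

For the block-diagonal matrix the two quantities coincide, while for the fully supported matrix the common information vanishes although the mutual information is positive, which shows how strongly common information depends on the zero pattern.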

4) Laws for Common Information: Dual to the non-decreasing property of joint information,
it is immediately clear that entropies of common information are non-increasing.

Proposition 14: Given a set $\{m_i : i \in [n]\}$ of information elements, if $\alpha \subseteq \beta$,
$\alpha, \beta \subseteq [n]$, then $H(m_\alpha) \ge H(m_\beta)$.

Compared with the case of joint information, one may naturally expect, as a dual counterpart
of the submodularity law of joint information, a supermodularity law to hold for common
information. In other words, we have the following conjecture.

Conjecture 1: For any three information elements $m_1$, $m_2$, and $m_3$, the following inequality
holds:

$$H(m_{12}) + H(m_{23}) \le H(m_{123}) + H(m_{2}). \tag{5}$$

We see this conjecture as natural because of the intrinsic duality between the join and meet
operations of information lattices. Due to the combinatorial nature of common information [12],
it is not obvious whether the conjecture holds. With the help of our approximation results
established in Theorems 3 and 5, we find, surprisingly, that neither the conjecture nor its converse
holds. In other words, common information observes neither the submodularity nor the
supermodularity law.

B. Continuous Laws for Joint and Common Information

As a consequence of Theorem 3, we shall see in the following that if a continuous law holds
for information elements, then the same law must hold for the log-indices of subgroups, and vice
versa. To convey this idea, we first present the simpler case involving only joint and common
information elements. To state our result formally, we first introduce two definitions.

Definition 12: Given a set $M = \{m_i : i \in [n]\}$ of information elements, consider the
collection $\mathcal{M} = \{m^\alpha, m_\beta : \alpha, \beta \subseteq [n]\}$ of join and meet
information elements generated from $M$. We call the real vector

$$\big( H(m^\alpha), H(m_\beta) : \alpha, \beta \subseteq [n],\ \alpha, \beta \neq \emptyset \big),$$

whose components are the entropies of the information elements of $\mathcal{M}$, the entropy
vector of $M$, denoted by $h_M$.

Definition 13: Given a set $\mathbf{G} = \{G_i : i \in [n]\}$ of subgroups of a group $G$, consider
the set $\mathcal{G} = \{G^\alpha, G_\beta : \alpha, \beta \subseteq [n]\}$ of the subgroups generated
from $\mathbf{G}$. We call the real vector whose components are the normalized log-indices of the
subgroups in $\mathcal{G}$ the normalized log-index vector of $\mathbf{G}$, denoted by
$l_{\mathbf{G}}$.

In this context, we assume that the components of both $l_{\mathbf{G}}$ and $h_M$ are listed
according to a common fixed order. Moreover, we note that both the vectors $h_M$ and
$l_{\mathbf{G}}$ have dimension $2^{n+1} - n - 2$.

Theorem 4: Let $f : \mathbb{R}^{2^{n+1} - n - 2} \to \mathbb{R}$ be a continuous function. Then,
$f(h_M) \ge 0$ holds for all sets $M$ of $n$ information elements if and only if
$f(l_{\mathbf{G}}) \ge 0$ holds for all sets $\mathbf{G}$ of $n$ subgroups of any group.

Theorem 4 is a special case of Theorem 5.

Theorem 4 and its generalization, Theorem 5, extend the result obtained by Chan and Yeung
in [2] in the following two ways. First, Theorems 4 and 5 apply to all continuous laws, while
only linear laws were considered in [2]. Even though we have not yet encountered any
nonlinear law for entropies, it is highly plausible that nonlinear information laws exist, given
the recent discovery that at least certain parts of the boundary of the entropy cones involving at
least four information elements are curved [33]. Second, our theorems encompass both common
information and joint information, while only joint entropies were considered in [2]. For example,
laws such as Propositions 13 and 14 cannot even be expressed in the setting of [2]. In fact, as
we shall see later in Section V-D, the laws of common information depart from those of joint
information very early: unlike joint information, which obeys the submodularity law, common
information admits neither submodularity nor supermodularity. For these reasons, we believe
that extending the subgroup approximation to common information is of interest in its own
right.

C. Continuous Laws for General Lattice Information Elements 

In this section, we extend Theorem 4 to all the information elements in information lattices,
not limited to the "pure" joint and common information elements. In the following, we introduce
some necessary machinery to formally present the result in full generality.

Note that an element of the lattice generated from a set $X$ has its expression built from
the generating elements of the lattice in a way similar to how terms are built from literals in
mathematical logic. In particular, we define lattice-terms as follows:

Definition 14: An expression $E$ is called a lattice-term formed from a set $X$ of literals if
either $E$ is a literal from $X$ or $E$ is formed from two lattice-terms with either the join or the
meet symbol: $E = x \;\mathrm{OP}\; y$, where $x$ and $y$ are lattice-terms and OP is either the join
symbol $\vee$ or the meet symbol $\wedge$.

Definition 15: Suppose that $E_i$, $i \in [k]$, are lattice-terms generated from a literal set of size
$n$: $X = \{x_1, \cdots, x_n\}$. We call an expression of the form

$$f\big(H(E_1), \cdots, H(E_k)\big),$$

where $f$ represents a function from $\mathbb{R}^k$ to $\mathbb{R}$ and $H$ represents the entropy
function, an n-variable generalized information expression.

We evaluate an n-variable generalized information expression $f(H(E_1), \cdots, H(E_k))$ against
a set $M = \{m_i : i \in [n]\}$ of information elements by substituting $x_i$ with $m_i$ respectively,
calculating the entropy of the information elements obtained by evaluating the lattice-terms $E_i$
according to the semantics of the join and meet operations on information elements, and then
obtaining the corresponding function value. We denote this value by

$$f\big(H(E_1), \cdots, H(E_k)\big)\big|_{M}.$$

Definition 16: If an n-variable generalized information expression $f(H(E_1), \cdots, H(E_k))$ is
evaluated non-negatively for any set of $n$ information elements, i.e.,

$$f\big(H(E_1), \cdots, H(E_k)\big)\big|_{M} \ge 0, \quad \text{for all } M,$$

then we call

$$f\big(H(E_1), \cdots, H(E_k)\big) \ge 0$$

an n-variable information law.

Similar to generalized information expressions, we define generalized log-index expressions as
follows.

Definition 17: We call an expression of the form

$$f\big(L(E_1), \cdots, L(E_k)\big),$$

where $f$ represents a function from $\mathbb{R}^k$ to $\mathbb{R}$ and $L$ represents the normalized
log-index function of subgroups, an n-variable generalized log-index expression.

We evaluate an n-variable generalized log-index expression $f(L(E_1), \cdots, L(E_k))$ against
a set $\mathbf{G} = \{G_i : i \in [n]\}$ of subgroups of a group $G$ by substituting $x_i$ with $G_i$
respectively, calculating the normalized log-index of the subgroups obtained by evaluating the
lattice-terms $E_i$ according to the semantics of the join and meet operations on subgroups, and
then obtaining the corresponding function value. We denote this value by

$$f\big(L(E_1), \cdots, L(E_k)\big)\big|_{\mathbf{G}}.$$

Definition 18: If an n-variable generalized log-index expression $f(L(E_1), \cdots, L(E_k))$ is
evaluated non-negatively for any set of $n$ subgroups of any group, i.e.,

$$f\big(L(E_1), \cdots, L(E_k)\big)\big|_{\mathbf{G}} \ge 0, \quad \text{for all } \mathbf{G},$$

then we call

$$f\big(L(E_1), \cdots, L(E_k)\big) \ge 0$$

an n-variable subgroup log-index law.
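
The evaluation of a generalized information expression against a set of information elements can be sketched as a small recursive evaluator. This is only an illustration under stated assumptions: lattice-terms are encoded as nested tuples, information elements are represented by their sample-space-partitions, and the sample space, the partitions, and the function f below are hypothetical examples.

```python
from math import log2

def join(p, q):
    """Join of information elements: the common refinement of the partitions."""
    return frozenset(a & b for a in p for b in q if a & b)

def meet(p, q):
    """Meet of information elements: the finest partition coarser than both."""
    parts = [set(a) for a in p]
    for b in q:
        touched = [a for a in parts if a & b]
        parts = [a for a in parts if not (a & b)] + [set().union(*touched)]
    return frozenset(frozenset(a) for a in parts)

def evaluate(term, assignment):
    """Evaluate a lattice-term: a literal name or a tuple (op, left, right)."""
    if isinstance(term, str):
        return assignment[term]
    op, left, right = term
    combine = join if op == "join" else meet
    return combine(evaluate(left, assignment), evaluate(right, assignment))

def H(partition, prob):
    """Entropy in bits of the information element identified with the partition."""
    masses = [sum(prob[w] for w in part) for part in partition]
    return -sum(p * log2(p) for p in masses if p > 0)

def make_partition(*blocks):
    return frozenset(frozenset(b) for b in blocks)

# Hypothetical uniform sample space {0, 1, 2, 3} and three information elements.
prob = [0.25, 0.25, 0.25, 0.25]
M = {
    "x1": make_partition({0, 1}, {2, 3}),
    "x2": make_partition({0, 2}, {1, 3}),
    "x3": make_partition({0}, {1}, {2, 3}),
}

E1 = ("join", "x1", "x2")
E2 = ("meet", ("join", "x1", "x2"), "x3")
E3 = ("meet", "x1", "x2")

def f(h1, h2, h3):
    """An arbitrary continuous function of three entropies, chosen for illustration."""
    return h1 - h2 - h3

entropies = [H(evaluate(E, M), prob) for E in (E1, E2, E3)]
value = f(*entropies)
```

The same evaluator could be pointed at subgroups instead of partitions by swapping in subgroup join and intersection and replacing H with a normalized log-index function.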

With the above formalism and corresponding notations, we are ready to state our equivalence
result concerning the generalized information laws.

Theorem 5: Suppose that $f$ is continuous. Then an n-variable information law

$$f\big(H(E_1), \cdots, H(E_k)\big) \ge 0$$

holds if and only if the corresponding n-variable subgroup log-index law

$$f\big(L(E_1), \cdots, L(E_k)\big) \ge 0$$

holds.

Proof: To see one direction, namely that $f(L(E_1), \cdots, L(E_k)) \ge 0$ implies
$f(H(E_1), \cdots, H(E_k)) \ge 0$, assume that there exists a set $M$ of information elements such
that $f(H(E_1), \cdots, H(E_k))|_{M} = a$ for some $a < 0$. By the continuity of the function $f$ and
Theorem 3, we are guaranteed to be able to construct, from the information lattice generated from
$M$, some subgroup lattice $L_{\mathbf{G}}$ such that the value of the function $f$ at the normalized
log-indices of the correspondingly constructed subgroups is arbitrarily close to $a < 0$. This
contradicts the assumption that $f(L(E_1), \cdots, L(E_k))|_{\mathbf{G}} \ge 0$ holds for all sets
$\mathbf{G}$ of $n$ subgroups of any group.

On the other hand, the normalized log-indices of the subgroups on any subgroup lattice
can be readily interpreted as the entropies of information elements, by taking the permutation
representation of the subgroups on the subgroup lattice and then producing an information
lattice according to the orbit-partition-permutation-group-action correspondence. Therefore, that
$f(H(E_1), \cdots, H(E_k))|_{M} \ge 0$ holds for all sets $M$ implies that
$f(L(E_1), \cdots, L(E_k))|_{\mathbf{G}} \ge 0$ holds for all sets $\mathbf{G}$. ■

D. Common Information Observes Neither Submodularity Nor Supermodularity Laws 

As discussed above, appealing to the duality between the join and the meet operations,
one might conjecture, dual to the well-known submodularity of joint information, that common
information would observe the supermodularity law. It turns out that common information
observes neither the submodularity (6) nor the supermodularity (7) law: neither of the following
two inequalities holds in general:

$$h(m_{12}) + h(m_{23}) \ge h(m_{123}) + h(m_{2}) \tag{6}$$
$$h(m_{12}) + h(m_{23}) \le h(m_{123}) + h(m_{2}). \tag{7}$$

Because common information is combinatorial in flavor (it depends on the "zero pattern" of
joint probability matrices [12]), it is hard to directly verify the validity of (6) and (7). However,
thanks to Theorem 5, we are able to construct subgroup counterexamples to invalidate (6) and (7)
indirectly.

To show that (7) fails, it suffices to find three subgroups $G_1$, $G_2$, and $G_3$ such that

$$|G_1 \vee G_2|\,|G_2 \vee G_3| < |G_1 \vee G_2 \vee G_3|\,|G_2|. \tag{8}$$

Consider $G = S_5$, the symmetric group of order $5!$, and its subgroups $G_1 = \langle (12345) \rangle$,
$G_2 = \langle (12)(35) \rangle$, and $G_3 = \langle (12543) \rangle$. The subgroup $G_1$ is the
permutation group generated by the permutation (12345), $G_2$ by (12)(35), and $G_3$ by (12543).
(Here, we use the standard cycle notation to represent permutations.) Consequently, we have
$G_1 \vee G_2 = \langle (12345), (12)(35) \rangle$, $G_2 \vee G_3 = \langle (12543), (12)(35) \rangle$,
and $G_1 \vee G_2 \vee G_3 = \langle (12345), (12)(35), (12543) \rangle$. It is easy to see that both
$G_1 \vee G_2$ and $G_2 \vee G_3$ are dihedral groups of order 10, because the involution (12)(35)
inverts each of the two 5-cycles, and that $G_1 \vee G_2 \vee G_3$ is the alternating group $A_5$,
hence of order 60. The order of $G_2$ is 2. Therefore, we see that the subgroups $G_1$, $G_2$, and
$G_3$ satisfy (8), since $10 \cdot 10 < 60 \cdot 2$. By Theorem 5, the supermodularity law (7) does
not hold in general for common information. (Thanks to Professor Eric Moorhouse for contributing
this counterexample.)

Similar to the case of supermodularity, the example with $G_2 = \{e\}$ and $G_1 = G_3 = G$,
$|G| \neq 1$, invalidates the group version of (6). Therefore, according to Theorem 5, the
submodularity law (6) does not hold in general for common information either.
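
The subgroup orders behind inequality (8) can be recomputed by brute force. The sketch below is an illustration, not part of the original argument: it takes the generators (12345), (12)(35), and (12543) in $S_5$, closes each generating set under composition, and counts elements. The helper names are assumptions of this sketch.

```python
def from_cycles(cycles, n=5):
    """Permutation of {1, ..., n}, stored 0-based as a tuple, from a list of cycles."""
    image = list(range(n))
    for cycle in cycles:
        for i, x in enumerate(cycle):
            image[x - 1] = cycle[(i + 1) % len(cycle)] - 1
    return tuple(image)

def compose(g, h):
    """Apply h first, then g."""
    return tuple(g[h[x]] for x in range(len(h)))

def generated_subgroup(generators):
    """Closure of the generators under composition (sufficient in a finite group)."""
    identity = tuple(range(len(generators[0])))
    elements, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            new = compose(s, g)
            if new not in elements:
                elements.add(new)
                frontier.append(new)
    return elements

a = from_cycles([(1, 2, 3, 4, 5)])     # generator of G1
b = from_cycles([(1, 2), (3, 5)])      # generator of G2
c = from_cycles([(1, 2, 5, 4, 3)])     # generator of G3

order_12 = len(generated_subgroup([a, b]))       # |G1 v G2|
order_23 = len(generated_subgroup([b, c]))       # |G2 v G3|
order_123 = len(generated_subgroup([a, b, c]))   # |G1 v G2 v G3|
order_2 = len(generated_subgroup([b]))           # |G2|

violates_supermodularity = order_12 * order_23 < order_123 * order_2
```

With these generators the two pairwise joins have order 10 and the triple join has order 60, so the product on the left of (8) is 100 and the product on the right is 120.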

VI. Discussion 

This paper builds on some of Shannon's little-recognized legacy and adopts his interesting
concepts of information elements and information lattices. We formalize all these concepts and
clarify the relations between random variables and information elements, information elements
and σ-algebras, and, especially, the one-to-one correspondence between information elements
and sample-space-partitions. We emphasize that such formalization is conceptually significant.
As demonstrated in this paper, thanks to this formalization, we are able to establish
a comprehensive parallelism between information lattices and subgroup lattices. This parallelism
is mathematically natural and admits intuitive group-action explanations. It reveals an intimate
connection, both structural and quantitative, between information theory and group theory. This
suggests that group theory might serve as a suitable mathematical language for
studying the deep laws governing information.

Network information theory in general, and capacity problems for network coding specifi- 
cally, depend crucially on our understanding of intricate structures among multiple information 
elements. By building a bridge from information theory to group theory, we can now access the 
set of well-developed tools from group theory. These tools can be brought to bear on certain 
formidable problems in areas such as network information theory and network coding. Along 
these lines, by constructing subgroup counterexamples we show that neither the submodularity 
nor the supermodularity law holds for common information, neither of which is obvious from 
traditional information theoretic perspectives. 

Appendix I 
Proof of Theorem 1

Proof: To show that two lattices are isomorphic, we need to demonstrate a mapping, from one
lattice to the other, that is a lattice-morphism (it honors both the join and meet operations)
and bijective as well. Instead of proving that $L_{\mathbf{G}}$ is isomorphic to $L_M$ directly, we
show that the dual of $L_{\mathbf{G}}$ is isomorphic to $L_M$. Figuratively speaking, the dual of a
lattice $L$ is the lattice obtained by flipping $L$ upside down. Formally, the dual lattice $L'$ of a
lattice $L$ is the lattice defined on the same set with the partial order reversed. Accordingly, the
join operation of the primal lattice $L$ corresponds to the meet operation of the dual lattice $L'$,
and the meet operation of $L$ to the join operation of $L'$. In other words, we show that
$L_{\mathbf{G}}$ is isomorphic to $L_M$ by demonstrating a bijective mapping
$\phi : L_{\mathbf{G}} \to L_M$ such that

$$\phi(G \vee G') = \phi(G) \wedge \phi(G') \tag{9}$$

and

$$\phi(G \wedge G') = \phi(G) \vee \phi(G') \tag{10}$$

hold for all $G, G' \in L_{\mathbf{G}}$.

Note that each subgroup on the subgroup lattice $L_{\mathbf{G}}$ is obtained from the set
$\mathbf{G} = \{G_i : i \in [n]\}$ via a sequence of join and meet operations, and each information
element on the information lattice $L_M$ is obtained similarly from the set
$M = \{m_i : i \in [n]\}$. Therefore, to show that $L_{\mathbf{G}}$ is isomorphic to $L_M$, according
to the induction principle, it is enough to demonstrate a bijective mapping $\phi$ such that

• $\phi(G_i) = m_i$, for all $G_i \in \mathbf{G}$ and $m_i \in M$;

• For any $G, G' \in L_{\mathbf{G}}$, if $\phi(G) = m$ and $\phi(G') = m'$, then

$$\phi(G \vee G') = m \wedge m', \text{ and} \tag{11}$$
$$\phi(G \wedge G') = m \vee m'. \tag{12}$$

Naturally, we take $\phi : L_{\mathbf{G}} \to L_M$ to be the mapping that assigns to each subgroup
$G \in L_{\mathbf{G}}$ the information element identified by the coset-partition of the subgroup $G$.
Thus, the initial step of the induction holds by assumption. On the other hand, it is easy to see
that the mapping so defined is bijective, simply because different subgroups always produce
different coset-partitions and vice versa. Therefore, we are left to show that Equations (11) and
(12) hold.

We first show that $\phi$ satisfies Equation (12). In other words, we show that the coset-partition
of the intersection subgroup $G \cap G'$ is the coarsest among all the sample-space-partitions that
are finer than both the coset-partitions of $G$ and $G'$. To see this, let $\Pi$ be a
sample-space-partition that is finer than both the coset-partitions of $G$ and $G'$, and let $\pi$
be a part of $\Pi$. Since $\Pi$ is finer than the coset-partition of $G$, $\pi$ must be contained in
some coset $C$ of $G$. For the same reason, $\pi$ must be contained in some coset $C'$ of $G'$ as
well. Consequently, $\pi \subseteq C \cap C'$ holds. Realizing that $C \cap C'$ is a coset of
$G \cap G'$, we conclude that the coset-partition of $G \cap G'$ is coarser than $\Pi$. Since $\Pi$
is chosen arbitrarily, this proves that the coset-partition of the intersection subgroup $G \cap G'$
is the coarsest among all the sample-space-partitions that are finer than both the coset-partitions
of $G$ and $G'$. Therefore, Equation (12) holds for $\phi$.

The proof for Equation (11) is more complicated. We use an idea called "transitive closure".
Similarly, we need to show that the coset-partition of the subgroup $G \vee G'$ generated from the
union of $G$ and $G'$ is the finest among all the sample-space-partitions that are coarser than both
the coset-partitions of $G$ and $G'$. Let $\Pi$ be a sample-space-partition that is coarser than both
the coset-partitions of $G$ and $G'$. Denote the coset-partition of the subgroup $G \vee G'$ by
$\tilde{\Pi}$. Let $\tilde{\pi}$ be a part of $\tilde{\Pi}$. It suffices to show that $\tilde{\pi}$ is
contained in some part of $\Pi$. Pick an element $x$ from $\tilde{\pi}$. This element $x$ must belong
to some part $\pi$ of $\Pi$. It remains to show $\tilde{\pi} \subseteq \pi$. In other words, we need
to show that $y \in \pi$ for any $y \neq x$, $y \in \tilde{\pi}$. Note that $\tilde{\pi}$ is a part of
the coset-partition of the subgroup $G \vee G'$. In other words, $\tilde{\pi}$ is a coset of
$G \vee G'$. The following reasoning depends on the following fact from group theory [25].

Proposition 15: Two elements $g_1$ and $g_2$ belong to the same (right) coset of a subgroup if and
only if $g_1 g_2^{-1}$ belongs to the subgroup.

Since $x$ and $y$ belong to the same coset $\tilde{\pi}$ of the subgroup $G \vee G'$, we have
$y x^{-1} \in G \vee G'$. Note that any element $g$ from $G \vee G'$ can be written in the form
$g = a_1 b_1 a_2 b_2 \cdots a_K b_K$, where $a_k \in G$ and $b_k \in G'$ for all $k \in [K]$. Suppose
$y x^{-1} = g = a_1 b_1 a_2 b_2 \cdots a_K b_K$. We have

$$y = a_1 b_1 a_2 b_2 \cdots a_K b_K x.$$

In the following we shall show that $y$ belongs to $\pi$ by induction on the sequence
$a_1 b_1 \cdots a_K b_K$. First, we claim $b_K x \in \pi$. To see this, note that $x \in \pi$. Since
$(b_K x) x^{-1} = b_K \in G'$, by Proposition 15, we know that $b_K x$ and $x$ belong to the same
coset $C_K$ of $G'$. By assumption, the partition $\Pi$ is coarser than the coset-partition of $G'$,
so the coset $C_K$ must be contained in $\pi$, since $\pi$ already contains an element $x$ of $C_K$.

For the same reason, with $b_K x \in \pi$ shown, we can see that $a_K b_K x$ belongs to $\pi$ as
well, because $(a_K b_K x)(b_K x)^{-1} = a_K \in G$ implies that $a_K b_K x$ and $b_K x$ belong to
the same coset of $G$. Continuing the above argument inductively on the sequence
$a_1 b_1 \cdots a_K b_K$, we finally have $a_1 b_1 \cdots a_K b_K x \in \pi$. Therefore, we have
$y \in \pi$. This concludes the proof. ■

Appendix II 
Proof of Theorem 3

Proof: The approximation process is decomposed into three steps. The first step is to
"dilate" the sample space so that we can turn a non-uniform probability space into a uniform
probability space. The sample space partitions of the information elements are accordingly
"dilated" as well. After dilating the sample space, depending on the approximation error tolerance,
i.e., $\epsilon$, we may need to further "amplify" the sample space. Then, we follow the same
procedure as in Section III-D and construct a subgroup lattice using the
orbit-partition-permutation-group-action correspondence.

We assume that the probability measure $P$ on the sample space is rational. In other words, the
probabilities of the elementary events $P_i = \Pr\{\omega_i\}$, $\omega_i \in \Omega$, are all
rational numbers, namely $P_i = p_i / q_i$ for some $p_i, q_i \in \mathbb{N}$. This assumption is
reasonable, because any finite-dimensional real vector can be approximated, up to an arbitrary
precision, by some rational vector.



February 2, 2008 DRAFT



Let $M$ be the least common multiple of the set $\{q_i\}$ of denominators. We "split" each sample
point $\omega_i$ in $\Omega$ into $p_i M / q_i$ points. Note that $p_i M / q_i$ is integral. We need
to accordingly "dilate" the sample space partitions of the information elements. Specifically, for
each part $\pi$ of the partition of every information element, its "dilated" part $\pi'$, in the
dilated sample space $\Omega'$, contains exactly all the sample points that are "split" from the
sample points in $\pi$. The dilated sample space $\Omega'$ has size
$\sum_{\omega_i \in \Omega} p_i M / q_i = M$. To maintain the probability structure, we assign to
each sample point in the dilated sample space $\Omega'$ probability $1/M$. In other words, we equip
the dilated sample space with a uniform probability measure. It is easy to check that the entire
(quantitative) probability structure remains the same. Thus, we can consider all the information
elements as if defined on the dilated probability space.
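
The dilation step is easy to reproduce with exact rational arithmetic. The following is a minimal sketch, assuming a hypothetical three-point probability space and a two-part information element; the function names are illustrative. It computes the least common multiple of the denominators, the number of copies of each sample point, and checks that the uniform measure on the dilated space gives every part its original probability.

```python
from fractions import Fraction
from math import lcm

def dilate(probabilities):
    """Split sample point i into P_i * M copies, where M is the lcm of the denominators."""
    M = lcm(*(p.denominator for p in probabilities))
    copies = [int(p * M) for p in probabilities]    # P_i * M is an integer by construction
    return M, copies

def dilated_part_sizes(partition, copies):
    """Size of each dilated part: total number of copies of its sample points."""
    return [sum(copies[w] for w in part) for part in partition]

# Hypothetical rational probability space with three sample points.
P = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
partition = [[0], [1, 2]]          # an information element with two parts

M, copies = dilate(P)
sizes = dilated_part_sizes(partition, copies)
original_masses = [sum(P[w] for w in part) for part in partition]
dilated_masses = [Fraction(s, M) for s in sizes]   # each dilated point carries 1/M
```

Note that `math.lcm` with several arguments requires Python 3.9 or later.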

If necessary, depending on the approximation error tolerance $\epsilon$, we may further "amplify"
the dilated sample space $\Omega'$ by $K$ times by "splitting" each of its sample points into $K$
points. At the same time, we scale the probability of each sample point in the post-amplification
sample space down by $K$ times to $\frac{1}{KM}$. By abuse of notation, we still use $\Omega'$ to
denote the post-amplification sample space. Similar to the "dilating" process, all the partitions
are accordingly amplified.

Before we move to the third step, we compute entropies for information elements in terms 
of the cardinality of the parts of its dilated sample space partition. Consider an information 
element m,. Denote its pre-dilation sample space partition by IT = {tt^, j E [J]} and its post- 
amplification sample space partition by n,; = {rr^j e [J]}. It is easy to see that the entropy 
H{rrii) can be calculated as follows: 

ir(m i ) = -^Pr{7rf'}logPr{7tf} 

ie[J] 

-- Prolog Pr{7rf} (13) 



tt:.l 1*51 



The entropies of all the other information elements on the entire information lattice, including 
the joint and common information elements, can be computed in exactly the same way in terms 
of the cardinalities of the parts of their dilated sample space partitions. 
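
As a sanity check of this cardinality form of the entropy, the sketch below compares the entropy 
computed from part probabilities with the entropy computed from part sizes. The partition, the 
values of $M$ and $K$, the function names, and the choice of base-2 logarithms are all assumptions 
made for illustration only. 

```python
from math import log2


def entropy_from_probs(part_probs):
    """Entropy of a partition given the probability of each part."""
    return -sum(p * log2(p) for p in part_probs if p > 0)


def entropy_from_sizes(part_sizes):
    """Entropy of a partition of a uniform sample space given the size of each part."""
    total = sum(part_sizes)
    return -sum((s / total) * log2(s / total) for s in part_sizes if s > 0)


# Hypothetical partition with part probabilities 1/2, 1/3, 1/6.
part_probs = [1 / 2, 1 / 3, 1 / 6]

# Dilation with M = 6 gives part sizes 3, 2, 1; amplification by K = 5 scales every part.
M, K = 6, 5
part_sizes = [K * round(M * p) for p in part_probs]

h_pre = entropy_from_probs(part_probs)
h_post = entropy_from_sizes(part_sizes)
```

The two values `h_pre` and `h_post` agree up to floating-point rounding, since amplification scales 
every part and the sample space by the same factor. 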

In the third step, we follow the same procedure as in Section III-D and construct, based on 
the orbit-partition-permutation-group-action correspondence, a subgroup lattice that is isomorphic 
to the information lattice generated by the set of information elements $\{m_i : i \in [n]\}$. More 
specifically, the subgroup lattice is constructed according to their "post-amplification" sample 
space partitions. 

Suppose that, on the constructed subgroup lattice, the permutation group $G_i$ corresponds to the 
information element $m_i$. As above, the "post-amplification" sample space partition of $m_i$ 
is $\hat{\pi}_i = \{\hat{\pi}^i_j : j \in [J]\}$. Then the cardinality of the permutation group is simply 

\[
|G_i| = \prod_{j \in [J]} |\hat{\pi}^i_j|!.
\]

According to the isomorphism relation established in Theorem 2, the above calculations remain 
valid for all the subgroups on the subgroup lattice. 
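
The product-of-factorials count can be checked by brute force on a tiny example. The sketch below 
assumes that the permutation group associated with a partition consists of all permutations that 
map each part to itself, so that the parts are exactly its orbits; the five-point sample space and 
the function name `stabilizer_order` are hypothetical. 

```python
from itertools import permutations
from math import factorial, prod


def stabilizer_order(parts):
    """Count the permutations of the sample points that map every part to itself."""
    points = sorted(x for part in parts for x in part)
    # block[x] is the index of the part containing x.
    block = {x: idx for idx, part in enumerate(parts) for x in part}
    count = 0
    for image in permutations(points):
        # The permutation sends points[k] to image[k]; keep it if no point leaves its part.
        if all(block[x] == block[y] for x, y in zip(points, image)):
            count += 1
    return count


# Hypothetical partition of a five-point sample space into parts of sizes 3 and 2.
parts = [{0, 1, 2}, {3, 4}]
brute_force = stabilizer_order(parts)
formula = prod(factorial(len(part)) for part in parts)
```

Enumerating all permutations is feasible only for very small sample spaces; the point of the sketch 
is that the brute-force count and the product of factorials coincide. 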

Recall that all the groups on the subgroup lattice are permutation groups and are all subgroups 
of the symmetric group of order $|\hat{\Omega}|!$. So the log-index of $G_i$, corresponding to $m_i$, is 

\[
\log \frac{|\hat{\Omega}|!}{|G_i|} = \log \frac{|\hat{\Omega}|!}{\prod_{j \in [J]} |\hat{\pi}^i_j|!}. \tag{14}
\]

As we see from the equations of Proposition 7, the entropies of the coset-partition 
information elements on information lattices equal exactly the log-indices of their subgroups 
on subgroup lattices. However, for the information lattice generated from general information 
elements, namely information elements with non-equal sample space partitions, as we see from 
Equations (13) and (14), the entropies of the information elements on the information lattice no 
longer equal the log-indices of their corresponding permutation groups on the subgroup lattice 
exactly. But, as we shall see, the entropies of the information elements are well 
approximated by the log-indices of their corresponding permutation groups. Recall 
Stirling's approximation formula for factorials: 

\[
\log n! = n \log n - n + o(n). \tag{15}
\]

"Normalizing" the log-index in Equation (14) by a factor $\frac{1}{|\hat{\Omega}|}$ and then substituting the 
factorials with the above Stirling approximation formula, we get 

\[
\frac{1}{|\hat{\Omega}|} \log \frac{|\hat{\Omega}|!}{|G_i|}
  = \frac{1}{|\hat{\Omega}|} \Big( |\hat{\Omega}| \log |\hat{\Omega}| - |\hat{\Omega}|
    - \sum_{j \in [J]} \big( |\hat{\pi}^i_j| \log |\hat{\pi}^i_j| - |\hat{\pi}^i_j| \big) + o(|\hat{\Omega}|) \Big).
\]

Note that in the above substitution process, we combined finitely many $o(|\hat{\Omega}|)$ terms into one 
$o(|\hat{\Omega}|)$ term. 

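It is clear that $\sum_{j \in [J]} |\hat{\pi}^i_j| = |\hat{\Omega}|$, since $\{\hat{\pi}^i_j : j \in [J]\}$ forms a 
partition of $\hat{\Omega}$. Therefore, we get 

\[
\frac{1}{|\hat{\Omega}|} \log \frac{|\hat{\Omega}|!}{|G_i|}
  = \frac{1}{|\hat{\Omega}|} \Big( |\hat{\Omega}| \log |\hat{\Omega}|
    - \sum_{j \in [J]} |\hat{\pi}^i_j| \log |\hat{\pi}^i_j| + o(|\hat{\Omega}|) \Big)
  = H(m_i) + \frac{o(|\hat{\Omega}|)}{|\hat{\Omega}|}.
\]
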

So the difference between the entropy $H(m_i)$ and the normalized log-index of its corresponding 
permutation subgroup $G_i$ vanishes as $|\hat{\Omega}|$ grows large. 
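
This convergence can be observed numerically. The sketch below is an illustration only: the base 
part sizes, the amplification factors, and the function names are hypothetical, natural logarithms 
are assumed, and `math.lgamma(n + 1)` is used to evaluate $\log n!$ without forming large factorials. 

```python
from math import lgamma, log


def log_factorial(n):
    """Natural logarithm of n!, computed via the log-gamma function."""
    return lgamma(n + 1)


def normalized_log_index(sizes):
    """(1/N) * log( N! / prod_j sizes[j]! ), where N is the total number of sample points."""
    n_points = sum(sizes)
    return (log_factorial(n_points) - sum(log_factorial(s) for s in sizes)) / n_points


def entropy(sizes):
    """Entropy (in nats) of a partition of a uniform sample space with the given part sizes."""
    n_points = sum(sizes)
    return -sum((s / n_points) * log(s / n_points) for s in sizes)


# Hypothetical dilated partition with part sizes 3, 2, 1, amplified by increasing factors K.
base_sizes = [3, 2, 1]
factors = (1, 10, 100, 1000, 10000)
gaps = []
for K in factors:
    sizes = [K * s for s in base_sizes]
    gaps.append(abs(entropy(sizes) - normalized_log_index(sizes)))
```

The entropy is the same for every `K`, while the entries of `gaps` shrink as `K` increases, in line 
with the $o(|\hat{\Omega}|)/|\hat{\Omega}|$ error term. 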

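Since both the entropy vector $h_{\mathcal{M}}$ and the log-index vector $l_{\mathcal{G}^N}$ are of finite 
dimension, it follows easily that 

\[
\Big\| h_{\mathcal{M}} - \frac{1}{N} \, l_{\mathcal{G}^N} \Big\| \to 0,
\]

with $N = |\hat{\Omega}| = MK \to \infty$, by taking $K \to \infty$. 
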

This concludes the proof. 